class: center, middle, inverse, title-slide .title[ # Week 2: Geocentric models ] --- Workspace setup: ``` r library(tidyverse) library(cowplot) library(rethinking) library(patchwork) ``` # Gaussian distributions Also known as normal distributions. These distributions are unimodal and symmetric. They tend to occur naturally, and they tend to be consistent with our assumptions (e.g., that measures are continuous values on the real number line, centered around a specific value). Later in the course, we'll move away from Gaussian distributions, but they're a useful place to start. --- # Recipes for models 1. Recognize a set of variables to work with. (Data and parameters.) 2. Define each variable either in terms of the other variables OR in terms of a probability distribution. 3. The combination of variables and their probability distributions defines a **joint generative model** that can be used to simulate hypothetical observations and analyze real ones. Here's an example: `\begin{align*} y_i &\sim \text{Normal}(\mu_i,\sigma) \\ \mu_i &= \beta x_i \\ \beta &\sim \text{Normal}(0,10) \\ \sigma &\sim \text{Exponential}(1) \\ x_i &\sim \text{Normal}(0,1) \\ \end{align*}` --- ## Model for globe-tossing Here's the model for last week's globe-tossing experiment: `\begin{align*} W &\sim \text{Binomial}(N,p) \\ p &\sim \text{Uniform}(0,1) \\ \end{align*}` -- * `\(W\)` is the observed count of water. * `\(N\)` is the total number of tosses. * `\(p\)` is the proportion of water on the globe. The whole model can be read as: > The count `\(W\)` is distributed binomially with sample size `\(N\)` and probability `\(p\)`. The prior for `\(p\)` is assumed to be uniform between 0 and 1. --- ## Model for globe-tossing Here's the model for last week's globe-tossing experiment: `\begin{align*} W &\sim \text{Binomial}(N,p) \\ p &\sim \text{Uniform}(0,1) \\ \end{align*}` ### Estimating the posterior We can use grid approximation to estimate the posterior distribution. 
``` r w <- 6; n <- 9 p_grid <- seq( from=0, to=1, length.out=100 ) posterior <- dbinom( w,n,p_grid )*dunif( p_grid,0,1 ) posterior <- posterior / sum(posterior) ``` --- .pull-left[ ``` r plot(p_grid, posterior, type = "l") ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-3-1.png" width="360" /> ] .pull-right[ Look familiar? ] --- ### Simulating parameters from the prior ``` r nsims = 1e4 sim_p <- runif( nsims, 0, 1) dens(sim_p) ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-4-1.png" width="504" /> --- ### Simulating observations from the prior ``` r sim_w <- rbinom( nsims, 9, sim_p) simplehist(sim_w) ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-5-1.png" width="504" /> Simulating from your priors -- **prior predictive simulation** -- is an essential part of modeling. This allows you to see what your choices imply about the data. You'll be able to diagnose bad choices. --- ### Exercise For the model defined below, simulate observed `\(y\)` values from the prior: `\begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*}` -- .pull-left[ ``` r set.seed(128); nsims <- 1e4 sim_mu = rnorm( nsims, 0, 10) sim_sig = rexp( nsims, 1) sim_y = rnorm(nsims,sim_mu,sim_sig) dens(sim_y) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-6-1.png" width="504" /> ] --- ### Exercise A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. -- Remember that every student got taller each year. Does this change your choice of priors? How? 
-- The variance in heights for students of the same age is never more than 64 cm² (i.e., a standard deviation of at most 8 cm). Does this lead you to revise your priors? ??? `\begin{align*} y_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \beta_0 + \beta_1 x_i \\ \beta_0 &\sim \text{Normal}(150, 50) \\ \beta_1 &\sim \text{Normal}(5, 2) \\ \sigma &\sim \text{Half-Normal}(0, 10) \end{align*}` --- # An example: height and weight Using the Howell data (make sure you have the `rethinking` package loaded). ``` r data("Howell1") d <- Howell1 str(d) ``` ``` ## 'data.frame': 544 obs. of 4 variables: ## $ height: num 152 140 137 157 145 ... ## $ weight: num 47.8 36.5 31.9 53 41.3 ... ## $ age : num 63 63 65 41 51 35 32 27 19 54 ... ## $ male : int 1 0 0 1 0 1 0 1 0 1 ... ``` ``` r precis(d) ``` ``` ## mean sd 5.5% 94.5% histogram ## height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁ ## weight 35.6106176 14.7191782 9.360721 54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁ ## age 29.3443934 20.7468882 1.000000 66.13500 ▇▅▅▃▅▂▂▁▁ ## male 0.4724265 0.4996986 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▇ ``` ``` r d2 <- d[ d$age >= 18, ] ``` --- ### Exercise Write a mathematical model for the heights in this data set. (Don't predict from other variables yet.) Note that height is presented in centimeters. Simulate both your priors and the expected observed height values from the prior. 
-- `\begin{align*} h &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(170, 20) \\ \sigma &\sim \text{Uniform}(0, 50) \\ \end{align*}` --- Simulate your priors. ``` r nsims <- 1e4 # number of simulations set.seed(128) # reproducibility sim_mu <- rnorm( nsims, 170, 20) # simulate values of mu sim_sig <- runif(nsims, 0, 50) # simulate values of sigma par(mfrow = c(1,3)) # plot display has 1 row, 3 columns dens_mu <- dens(sim_mu) # density of mu dens_sig <- dens(sim_sig) # density of sigma dens_both <- plot(sim_mu, sim_sig, cex = .5, pch = 16, col=col.alpha("#1c5253",0.1) ) # both together ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-8-1.png" width="720" /> ``` r dev.off() # reset the plot display settings ``` ``` ## RStudioGD ## 2 ``` --- Simulate values of height. ``` r sim_h <- rnorm( nsims, sim_mu, sim_sig) dens(sim_h) ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-10-1.png" width="504" /> ``` r PI(sim_h, .89) ``` ``` ## 5% 94% ## 115.1411 224.8321 ``` ??? 115 cm ≈ 3.8 feet 225 cm ≈ 7.4 feet --- Use grid approximation to calculate the posterior distribution. (Not necessary to copy, just for teaching purposes.) 
``` r # values of mu and sigma to test mu.list <- seq( from=150, to=190 , length.out=200 ) sigma.list <- seq( from=2 , to=18 , length.out=200 ) # every possible combination of mu and sigma post <- expand.grid( mu=mu.list , sigma=sigma.list ) # calculate log-likelihood of heights for each combination of mu and sigma post$LL <- sapply( 1:nrow(post) , function(i) sum( dnorm( d2$height , post$mu[i] , post$sigma[i] , log=TRUE ) ) ) # add the log-priors post$prod <- post$LL + dnorm( post$mu , 170 , 20 , TRUE ) + dunif( post$sigma , 0 , 50 , TRUE ) # convert from log posterior to relative probability post$prob <- exp( post$prod - max(post$prod) ) ``` --- ``` r post %>% ggplot(aes(x = mu, y = sigma, color = prob)) + geom_point() + scale_color_gradient(low = "white", high = "#1c5253") + theme_cowplot() ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-12-1.png" width="432" /> --- Go back to your code and change the range of values you estimate. ``` r # values of mu and sigma to test mu.list <- seq( from=152, to=157 , length.out=100 ) sigma.list <- seq( from=7 , to=9 , length.out=100 ) ``` Rerun all the earlier code. --- ``` r post %>% ggplot(aes(x = mu, y = sigma, color = prob)) + geom_point() + scale_color_gradient(low = "white", high = "#1c5253") + theme_cowplot() ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-15-1.png" width="432" /> --- Cool, but we said last week that grid approximation is unwieldy and quickly becomes unmanageable. So let's repeat this process with **quadratic approximation.** We won't be calculating the probability or likelihood of values directly (too costly), but we can make some assumptions about the shapes of distributions and get an approximation of the shape of the posterior. 
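
---

Under the hood, the idea is: climb to the posterior's peak, then measure the curvature there and treat the posterior as Gaussian. Here is a minimal base-R sketch of that idea using `optim()` -- note the simulated stand-in heights, prior values, and starting values below are illustrative assumptions, not the Howell data or the model we fit next:

``` r
# Quadratic approximation by hand (illustrative sketch only):
# minimize the negative log posterior, then read the curvature
# (Hessian) at the mode off as a Gaussian approximation.
set.seed(1)
h <- rnorm(300, mean = 155, sd = 8)  # stand-in heights, NOT the Howell data

neg_log_post <- function(par) {
  mu <- par[1]; sigma <- par[2]
  if (sigma <= 0 || sigma > 50) return(Inf)  # stay inside the uniform prior
  -(sum(dnorm(h, mu, sigma, log = TRUE)) +   # log-likelihood
      dnorm(mu, 170, 20, log = TRUE) +       # prior on mu
      dunif(sigma, 0, 50, log = TRUE))       # prior on sigma
}

fit <- optim(c(150, 10), neg_log_post, hessian = TRUE)
fit$par                         # posterior mode (the MAP estimate)
sqrt(diag(solve(fit$hessian)))  # approximate posterior standard deviations
```

The Hessian of the negative log posterior at the mode is the inverse of the approximating Gaussian's covariance matrix, which is why inverting it and taking the diagonal gives (squared) standard deviations.

---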
``` r flist <- alist( height ~ dnorm( mu , sigma ) , mu ~ dnorm( 178 , 20 ) , sigma ~ dunif( 0 , 50 ) ) m4.1 <- quap( flist , data=d2 ) precis( m4.1 ) ``` ``` ## mean sd 5.5% 94.5% ## mu 154.607013 0.4119939 153.94857 155.265458 ## sigma 7.731319 0.2913847 7.26563 8.197008 ``` These numbers provide Gaussian approximations for each parameter’s **marginal distribution**. This means the plausibility of each value of `\(\mu\)`, after averaging over the plausibilities of each value of `\(\sigma\)`, is given by a Gaussian distribution with mean 154.6 and standard deviation 0.4. ??? The function `alist()` does not evaluate its arguments, whereas `list()` does. Our interest in quadratic approximation, recall, is as a handy way to quickly make inferences about the shape of the posterior. The posterior’s peak will lie at the MAXIMUM A POSTERIORI estimate (MAP), and we can get a useful image of the posterior’s shape by using the quadratic approximation of the posterior distribution at this peak. To build the quadratic approximation, we’ll use `quap()`, a function in the rethinking package. The `quap()` function works from the model definition introduced earlier: each line in the mathematical definition has a corresponding line of R code. The engine inside `quap()` uses these definitions to define the posterior probability at each combination of parameter values. Then it can climb the posterior distribution and find the peak, its MAP. Finally, it estimates the quadratic curvature at the MAP to produce an approximation of the posterior distribution. Remember: this procedure is very similar to what many non-Bayesian procedures do, just without any priors. --- `quap()` has approximated a **multivariate** Gaussian distribution -- more than one parameter, and these parameters may be related. 
``` r vcov( m4.1 ) ``` ``` ## mu sigma ## mu 0.1697389710 0.0002177928 ## sigma 0.0002177928 0.0849050288 ``` ``` r diag( vcov( m4.1 ) ) ``` ``` ## mu sigma ## 0.16973897 0.08490503 ``` ``` r cov2cor( vcov( m4.1 ) ) ``` ``` ## mu sigma ## mu 1.000000000 0.001814203 ## sigma 0.001814203 1.000000000 ``` --- You can extract samples. ``` r post <- extract.samples( m4.1 , n=1e4 ) head(post) ``` ``` ## mu sigma ## 1 154.9757 7.922596 ## 2 153.9188 7.931269 ## 3 154.8100 7.903120 ## 4 153.9765 7.798361 ## 5 155.0161 7.414213 ## 6 154.9462 7.334470 ``` ``` r precis(post) ``` ``` ## mean sd 5.5% 94.5% histogram ## mu 154.599639 0.4153199 153.927287 155.258969 ▁▁▁▅▇▂▁▁ ## sigma 7.733434 0.2947761 7.259968 8.205634 ▁▁▁▂▅▇▇▃▁▁▁▁ ``` --- ## Adding in a linear component We might assume that weight and height are associated with each other. Indeed, within our sample: ``` r plot(d2$height ~ d2$weight) ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-19-1.png" width="504" /> --- ### Exercise Update your mathematical model to incorporate weight. Simulate from your priors to see the implied regression lines. -- `\begin{align*} h_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta (x_i - \bar{x}) \\ \alpha &\sim \text{Normal}(170, 50) \\ \beta &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Uniform}(0, 50) \\ \end{align*}` ??? 
`\(=\)` is deterministic -- once we know the other variables, `\(\mu_i\)` is known with certainty. Made-up parameters are the targets of learning. --- To simulate from our priors: ``` r # simulate 100 lines nsims <- 100 sim_alpha = rnorm(nsims, 170, 50) sim_beta = rnorm(nsims, 0, 10) # calculate mean weight, for centering xbar = mean(d2$weight) # plot with nothing in it plot(NULL, xlim = range(d2$weight), ylim = c(-100, 400), xlab = "weight", ylab = "height") abline(h = 0, lty = 2) # line at 0 abline(h = 272, lty = 2) # line at world's tallest person # plot each line for(i in 1:nsims){ curve(sim_alpha[i] +sim_beta[i]*(x-xbar), add = T, col=col.alpha("#1c5253",0.4)) } ``` --- <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-20-1.png" width="720" /> --- Describe in words what's wrong with our priors. -- The slope should not be negative. How can we fix this? -- We could use a uniform distribution bounded below at 0. --- ``` r # simulate 100 lines nsims <- 100 sim_alpha = rnorm(nsims, 170, 50) sim_beta = runif(nsims, 0, 10) # calculate mean weight, for centering xbar = mean(d2$weight) # plot with nothing in it plot(NULL, xlim = range(d2$weight), ylim = c(-100, 400), xlab = "weight", ylab = "height") abline(h = 0, lty = 2) # line at 0 abline(h = 272, lty = 2) # line at world's tallest person # plot each line for(i in 1:nsims){ curve(sim_alpha[i] +sim_beta[i]*(x-xbar), add = T, col=col.alpha("#1c5253",0.4)) } ``` --- <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-21-1.png" width="720" /> --- ### Exercise Fit the new height model using the quadratic approximation. -- ``` r flist <- alist( height ~ dnorm( mu , sigma ) , mu <- a + b*(weight - mean(weight)), a ~ dnorm( 170 , 50 ) , b ~ dunif(0, 10), sigma ~ dunif( 0 , 50 ) ) m2 <- quap( flist , data=d2 ) precis( m2 ) ``` ``` ## mean sd 5.5% 94.5% ## a 154.5975429 0.2703274 154.1655074 155.029578 ## b 0.9050292 0.0419279 0.8380203 0.972038 ## sigma 5.0718661 0.1911531 4.7663665 5.377366 ``` --- Simulate lines from this distribution. 
``` r sample_post = extract.samples(m2, n = 100) plot(d2$weight, d2$height, cex = .5, pch = 16, xlim = range(d2$weight), ylim = c(-100, 400), xlab = "weight", ylab = "height") abline(h = 0, lty = 2) #line at 0 abline(h = 272, lty = 2) #line at world's tallest person #plot each line for(i in 1:nrow(sample_post)){ curve(sample_post$a[i] +sample_post$b[i]*(x-xbar), add = T, col=col.alpha("#1c5253",0.1)) } ``` <img src="data:image/png;base64,#lecture02-1_files/figure-html/unnamed-chunk-23-1.png" width="504" />
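
---

One last note on what `extract.samples()` is doing for a `quap()` fit: it is essentially drawing from the multivariate Gaussian defined by the posterior means and the variance-covariance matrix. A minimal sketch with `MASS::mvrnorm()`, plugging in the (rounded) `m4.1` values printed earlier as assumed inputs:

``` r
# extract.samples() for a quap model amounts to sampling from a
# multivariate normal with the fitted mean vector and covariance
# matrix. The numbers below are the rounded m4.1 estimates.
library(MASS)
mu_hat <- c(mu = 154.61, sigma = 7.73)
V <- matrix(c(0.16974, 0.00022,
              0.00022, 0.08491),
            nrow = 2, dimnames = list(names(mu_hat), names(mu_hat)))
set.seed(128)
post_sketch <- as.data.frame(mvrnorm(1e4, mu = mu_hat, Sigma = V))
colMeans(post_sketch)  # close to the precis() means for m4.1
```

Because the covariance between `mu` and `sigma` is nearly zero here, the draws look almost independent -- compare `cov2cor(V)` with the `cov2cor(vcov(m4.1))` output from earlier.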